The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability

Authors

  • Mark A. Pitt
  • Keith Johnson
  • Elizabeth Hume
  • Scott Kiesling
  • William D. Raymond
Abstract

This paper describes the Buckeye corpus of spontaneous American English speech, a 307,000-word corpus containing the speech of 40 talkers from central Ohio, USA. The method used to elicit and record the speech is described, followed by a description of the protocol that was developed to phonemically label what talkers said. The results of a test of labeling consistency are then presented. The corpus will be made available to the scientific community when labeling is completed. © 2004 Elsevier B.V. All rights reserved.

Similar resources

An analysis of transcription consistency in spontaneous speech from the buckeye corpus

We present a preliminary analysis of transcriber consistency in labeling and segmentation of words and phones in the Buckeye corpus of spontaneous, informal speech. We find that pairwise inter-transcriber agreement on exact phone label match was 76%, and segmentation agreement within 20% of phone pair length was 75%, though longer phones are more consistently segmented than shorter phones. Patt...

Naïve listeners’ prominence and boundary perception

This paper examines how ordinary listeners, naïve with respect to the phonetics and phonology of prosody, perceive the location of prosodic boundaries that demarcate speech “chunks” and prominences that serve a “highlighting” function, in spontaneous speech (Buckeye corpus). Over 70 naïve listeners marked the locations of prominences and boundaries in a real-time transcription task. Fleiss’ mul...

The buckeye corpus of speech: updates and enhancements

This paper describes recent progress in the development of the Buckeye Corpus of Speech, a phonetically labeled corpus of conversational American English speech, first described in [1]. With the publication of the second phase of transcription, the corpus has nearly doubled in size from the first release. We briefly give an overview of the corpus, report on additional studies of inter-labeler a...

Aligning phonetic transcriptions with their citation forms

One of the main motivations for publishing this paper is to make available a matrix of phone-distance measures which may be useful in dealing with large corpora of conversational speech. The paper reports how this matrix of phone-distances was created from transcriber labeling disagreements, and how it can be used in a dynamic time warping algorithm to align phonetic transcriptions of conversat...

Prosody in a corpus of French spontaneous speech: perception, annotation and prosody ~ syntax interaction

Our study focuses on prosodic annotation and the prosody ~ syntax interface in conversation, and is based on a large corpus of conversational speech in French. The results of inter-transcriber agreement tests show that two expert transcribers label prosodic phrasing consistently, with agreement well above chance. A qualitative analysis reveals transcri...

Journal:
  • Speech Communication

Volume 45

Publication date: 2005